Make op_upsample_bilinear2d_aa_test deterministic #19357
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19357
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 20 Pending as of commit f2388dd with merge base b3baac5.
NEW FAILURE - The following job has failed: …
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104150928.
Pull request overview
This PR addresses test flakiness in the portable anti-aliased bilinear upsample parity tests by making their randomized inputs deterministic, and updates the iOS LLM Apple BUCK rule’s test timeout/label configuration to accommodate very long-running XCTest inference runs.
Changes:
- Seed PyTorch RNG in `UpsampleBilinear2dAATest` to make `randn`/`randint` inputs deterministic across runs (sketched after this list).
- Relax `atol` for two specific flaky parity tests to match the observed worst-case divergence under the pinned seed.
- Update Apple LLM BUCK test labeling/timeouts to allow significantly longer XCTest execution.
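For context, a minimal sketch of the seeding pattern the first bullet describes (the class name matches the test file, but the test body and tensor shapes here are illustrative, not the actual test code):

```python
import unittest

import torch


class UpsampleBilinear2dAATest(unittest.TestCase):
    def setUp(self) -> None:
        # Re-seed before every test method so torch.randn / torch.randint
        # draw the same values on every run, in any test order.
        torch.manual_seed(0)

    def test_inputs_are_deterministic(self) -> None:
        # Illustrative check: re-seeding reproduces the exact same draw.
        x = torch.randn(1, 3, 16, 16)
        torch.manual_seed(0)
        y = torch.randn(1, 3, 16, 16)
        self.assertTrue(torch.equal(x, y))
```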
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| kernels/portable/test/op_upsample_bilinear2d_aa_test.py | Pins RNG for deterministic inputs and adjusts tolerances to eliminate flakes in ATen-vs-ExecuTorch parity checks. |
| extension/llm/apple/BUCK | Changes test label and increases overall test rule timeout to support long-running iOS-sim LLM inference tests. |
```python
def setUp(self) -> None:
    # Pin RNG so torch.randn / torch.randint inputs are deterministic.
    # Without this, the parity tests below occasionally see input values
    # that produce ATen-vs-ExecuTorch differences just above the
    # configured atol, surfacing as flakes on the test-issues dashboard.
    torch.manual_seed(0)
```
Summary:
Three test methods in
`fbcode/executorch/kernels/portable/test/op_upsample_bilinear2d_aa_test.py`
have been auto-disabled as flaky on the test-issues dashboard
(owner ai_infra_mobile_platform):
- test_upsample_bilinear2d_aa_aten_parity_u8
- test_upsample_bilinear2d_aa_aggressive_downsampling
- test_upsample_bilinear2d_aa_align_corners_downsampling
Root cause: each test builds its input via `torch.randint(...)` or
`torch.randn(...)` with no seed pinned, so each run sees a different
sample. The configured `atol` was tight enough that on some draws the
ATen-vs-ExecuTorch divergence (driven by separable-vs-direct
anti-aliased interpolation differences) crossed the threshold and the
test flipped to FAIL. The kernel implementations themselves are not
changing across runs.
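To make the failure mode concrete, here is a toy illustration (the perturbation is synthetic; the real divergence comes from the separable-vs-direct anti-aliasing difference, and 0.4 stands in for the old aggressive-downsampling `atol`):

```python
import torch

atol = 0.4  # the old aggressive-downsampling tolerance
for run in range(3):
    x = torch.randn(1, 3, 64, 64)  # no seed pinned: new sample each run
    ref = torch.nn.functional.interpolate(
        x, size=(16, 16), mode="bilinear", antialias=True
    )
    # Synthetic stand-in for the ExecuTorch output: an input-dependent
    # perturbation whose worst case hovers right around atol.
    out = ref + 0.1 * x.abs().amax() * torch.rand_like(ref)
    max_err = (out - ref).abs().max().item()
    print(f"run {run}: max abs err {max_err:.3f} -> "
          f"{'PASS' if max_err <= atol else 'FAIL'}")
```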
Fix:
1. Add `setUp(self): torch.manual_seed(0)` so every run sees the same
input tensor and the same divergence, eliminating the run-to-run
FAIL/PASS oscillation.
2. Bump two atol thresholds to cover the worst-case observed
divergence with the now-pinned input:
- u8 parity: 3.5 -> 5 (observed max abs error 4 / 255)
- aggressive 4x downsampling: 0.4 -> 1.0 (observed max abs error
~0.59 for N(0,1) input)
3. The pre-existing `atol=0.25` on align_corners_downsampling is left
unchanged - with seed 0 it now passes consistently.
The relaxed tolerances are still well below any change that would
indicate an actual kernel regression; the comprehensive C++ test
suite in `op_upsample_bilinear2d_aa_test.cpp` still validates the
kernel under tighter constraints.
Differential Revision: D104150928
Force-pushed from bd692d8 to bdfcb80.
@claude review this PR and the comment from Copilot, seems reasonable
Claude finished @psiddh's task in 1m 3s (View job)

Review of PR #19357

Overall: This PR looks good. The fix is straightforward and the approach (pinning the seed + adjusting tolerances) is the standard pattern for making flaky numerical-parity tests deterministic.

On Copilot's suggestion (RNG state save/restore): Copilot's suggestion to save/restore RNG state via `torch.get_rng_state()` / `torch.set_rng_state()` in `setUp`/`tearDown` … So the current approach of just `torch.manual_seed(0)` in `setUp` …

Minor observations: …

LGTM — no blocking concerns.
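The trade-off being discussed, sketched as a toy script (not code from the PR): `torch.manual_seed` mutates process-global RNG state, so anything that runs later in the same process sees a reseeded stream unless the state is restored.

```python
import torch

before = torch.rand(1)   # depends on whatever state the process had
torch.manual_seed(0)     # what the new setUp does
in_test = torch.rand(1)  # deterministic: identical on every run
after = torch.rand(1)    # also deterministic now; the seed has "leaked"
                         # into everything that runs later in-process
print(before.item(), in_test.item(), after.item())
```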
Reviewed By: rascani
Differential Revision: D104150928
Copilot's suggested alternative, which preserves the process-global RNG state:

```python
def setUp(self) -> None:
    # Save RNG state so we can restore it in tearDown; without this,
    # `torch.manual_seed` would leak determinism into other test
    # modules that share the same process.
    self._torch_rng_state = torch.get_rng_state()
```
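The matching restore would presumably live in `tearDown`; a sketch assuming the attribute name from the hunk above:

```python
def tearDown(self) -> None:
    # Restore the RNG state captured in setUp so the pinned seed does
    # not leak into tests that run later in the same process.
    torch.set_rng_state(self._torch_rng_state)
```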